Abstract

We discuss the difficulties of satisfying high-assurance system
requirements without sacrificing system capabilities. To alleviate
this problem, we show how trade-offs can be made to reduce the
threat of covert channels.

We also clarify certain concepts in the theory of covert channels.
Traditionally, a covert channel’s vulnerability was measured by its
capacity. We show why a capacity analysis alone is not sufficient to
evaluate the vulnerability and introduce a new metric referred to as
the “small message criterion”.

1 Introduction

In this paper we discuss how covert channels arise in the area of
high-assurance systems. We give an overview of covert channel
theory, with examples, and advance our hypothesis that covert
channels can never be totally eliminated in many “practical”
high-assurance systems. A high-assurance system should meet its
reliability, security, and performance requirements as efficiently
as possible, but conflicts between these requirements are inherent.

The paper is organized as follows:

• We show how reliability and performance requirements can
undermine efforts at thwarting covert channels.

• We look at covert channels in terms of information theory and
clarify certain concepts.

• We suggest that a capacity analysis alone does not suffice when
dealing with covert channels and introduce a new metric referred to
as the “small message criterion”.

• We discuss trade-offs between covert channel degradation and
performance.

• We then discuss our recent work on the “pump” and show how it
reduces the covert channel threat without degrading performance.


2 Practical High-Assurance Multilevel Systems

All multilevel systems require information flow from Low to High¹.
Two methods of information flow that do not violate BLP [1] are
read-down and blind write-up. However, these methods have practical
problems in terms of reliability and performance [11].

As the computing environment becomes more sophisticated,
complicated operations are needed and other features, e.g.,
atomicity, become crucial requirements. One type of high-assurance
system is a secure database system (DBS). Reliability and atomicity²
of transactions in a secure DBS are integral components of
high-assurance computing. In the following, we show how difficult it
is to totally eliminate covert channels in today’s sophisticated
high-assurance computer systems.

Mathur and Keefe [13] showed that conflicts exist between atomicity
and security in the case of multilevel transaction execution [4].
In other words, there may be no concurrency controller that can
schedule multilevel transactions and guarantee the atomicity of
transactions and security simultaneously.

Let us consider the potential conflicts between reliability and
security. In a DBS, a user/process wishes to receive an
acknowledgement of a successful update. Without acknowledgements,
necessary data may be overwritten, or may be lost during a crash,
which is unacceptable in a high-assurance DBS. In a non-secure DBS
there is no problem with acknowledgements; however, in a secure DBS
acknowledgements can allow a covert channel to exist between High
and Low.

Consider the following example of an object-oriented DB program
involving three objects: (1) EMPLOYEE, (2) PAY_INFO, and
(3) WORK_INFO.


¹ If a multilevel system does not require information flow from Low
to High, then we consider it to be two system-high systems.

² All or nothing, without the appearance of interruptions.
Figure 1: Objects in payroll database (PAY_INFO, EMPLOYEE, and
WORK_INFO).


The main program calls the employee method in the EMPLOYEE object
(in C++ look-alike pseudocode):

employee()
{
    PAY_INFO.pay();                  // PAY_INFO reads the hours worked
    WORK_INFO.reset_weekly_hours();  // only then are the hours reset
}

The pay method in turn:

pay()
{
    WORK_INFO.get_hours();           // read Hours_worked from WORK_INFO
}

In the conventional programming sense (i.e., if there is no
parallelism between PAY_INFO.pay() and
WORK_INFO.reset_weekly_hours()), the correctness of the program is
guaranteed by executing WORK_INFO.reset_weekly_hours() after
PAY_INFO.pay() is executed (see figure 1).

If the same transaction is performed in a secure system, where
PAY_INFO is a High object and EMPLOYEE and WORK_INFO are Low
objects, then the above solution is not acceptable because the
acknowledgement (4) from PAY_INFO (High) to EMPLOYEE (Low) can be
used as a covert timing channel: PAY_INFO can modulate the time at
which the acknowledgement (4) is sent to EMPLOYEE.
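To make the threat concrete, here is a minimal sketch of how such an acknowledgement could carry information. It is an illustration only, not taken from the cited work: the delay values, the decoding threshold, and the function names are all assumptions, and the channel is assumed noiseless.

```python
# Hypothetical model of a covert timing channel through acknowledgement
# delays. The tick values and function names are illustrative assumptions.

FAST_ACK = 1   # ticks High waits before acknowledging, to signal a 0
SLOW_ACK = 3   # ticks High waits before acknowledging, to signal a 1


def high_encode(bits):
    """High (Trojan horse) chooses one acknowledgement delay per secret bit."""
    return [SLOW_ACK if b else FAST_ACK for b in bits]


def low_decode(delays, threshold=2):
    """Low recovers the bits by timing each acknowledgement."""
    return [1 if d > threshold else 0 for d in delays]


secret = [1, 0, 1, 1]
observed = high_encode(secret)
print(low_decode(observed))  # [1, 0, 1, 1]
```

The content of the acknowledgement never changes; only its arrival time does, which is why blocking or fixing the timing of High-to-Low responses is the natural countermeasure.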

To overcome this security problem, Jajodia and Kogan [10] proposed
a message filter that enforces the security policy in multilevel
object-oriented systems. Sandhu, Thomas, and Jajodia [21] proposed
a covert channel free implementation strategy of this message
filter in the kernelized architecture [7]. The proposed covert
channel free solution is as follows (the heavy blocks in the
diagram represent the message filters):


Figure 2: Proposed solution for a secure payroll database
(PAY_INFO, EMPLOYEE, and WORK_INFO, separated by message filters).

• The transaction is initiated by EMPLOYEE sending the PAY message
to PAY_INFO. If PAY_INFO can send an acknowledgement back to
EMPLOYEE then a potential covert timing channel exists. Hence, the
message filter sends NIL right away and blocks any response from
High to Low.

• PAY_INFO, in turn, sends GET_HOURS to WORK_INFO to read
Hours_worked. WORK_INFO should not know when, or by whom, its
information is read³ (if it knows, then this information can be
used as a covert channel).

• In the meantime, EMPLOYEE sends RESET_WEEKLY_HOURS to WORK_INFO
to reset Hours_worked. WORK_INFO can send DONE to EMPLOYEE because
they are at the same level.

• To guarantee that GET_HOURS reads the value of Hours_worked
before RESET_WEEKLY_HOURS is executed, WORK_INFO uses a multiple
version scheme. In other words, WORK_INFO always makes a new
version whenever its information is updated so that High can read
appropriate (but potentially old) versions of Hours_worked.

Even though the above solution is covert channel free, it has a few
practical problems. Let us consider these problems:

• Since the computer resources are limited, some versions have to
be deleted from the system after some time has passed (i.e.,
garbage collection). Therefore we cannot keep all old versions of
Hours_worked, and PAY_INFO might not read the correct version of
Hours_worked.

• When PAY is sent to PAY_INFO, the message filter sends NIL right
away. Hence, EMPLOYEE cannot confirm whether PAY_INFO actually
received a message or not; it just hopes that the message arrived
safely at the destination. If, somehow, PAY_INFO is not ready to
receive a message or the message passing system has some problems,
then the message may never get to PAY_INFO, and EMPLOYEE would
never know what happened to the message. If EMPLOYEE knew that the
message was not delivered to PAY_INFO, e.g., by not receiving the
DONE message as in the non-secure version, it could repeat the
message or abandon the task without resetting Hours_worked.

• Even if the message eventually gets to PAY_INFO, the correct
version of Hours_worked may be deleted already because WORK_INFO
does not know when, or by whom, the data is read.

³ There is some controversy over how High can send read-only
messages to Low without revealing its activity. However, in this
paper, we assume that there is a covert channel free implementation
to send read-only messages to Low.
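The interaction between the multiple version scheme and garbage collection can be sketched as follows. This is a hypothetical model, not the implementation of [21]: the class name, the integer timestamps, and the "keep the newest k versions" collection policy are assumptions made only for illustration.

```python
# Hypothetical multi-version cell; class and method names are assumptions.

class VersionedCell:
    def __init__(self, initial, keep=2):
        self.versions = [(0, initial)]  # (timestamp, value), oldest first
        self.keep = keep                # versions that survive collection

    def write(self, timestamp, value):
        """Low update: append a new version instead of overwriting."""
        self.versions.append((timestamp, value))
        # Garbage collection: storage is finite, so drop the oldest versions.
        self.versions = self.versions[-self.keep:]

    def read_as_of(self, timestamp):
        """High read: latest version written at or before `timestamp`."""
        candidates = [v for (ts, v) in self.versions if ts <= timestamp]
        if not candidates:
            raise LookupError("version already garbage-collected")
        return candidates[-1]


hours = VersionedCell(initial=40, keep=2)
hours.write(10, 0)             # RESET_WEEKLY_HOURS at time 10
print(hours.read_as_of(5))     # 40: High still sees the pre-reset value
hours.write(20, 38)            # later Low updates push out the old version
hours.write(30, 0)
try:
    hours.read_as_of(5)        # a delayed High read now fails
except LookupError as err:
    print("lost:", err)
```

Because the Low object may not learn when High reads, it cannot postpone collection on High's behalf; any finite retention policy leaves some delay after which the High read fails.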

We have shown that it is difficult to obtain reliability and
atomicity in conjunction with security. Information theory can
quantify some of these difficulties involving covert channels.

3 Information Theory Background

In brief, a covert channel is a communication channel that exists,
contrary to design, in a computer system. There are three
historical patterns of covert channel analysis that concern us. The
first is the use of the term bandwidth [6] instead of capacity, the
second is ignoring the encoding/decoding process, and the third is
using capacity as the total measure of insecurity, instead of
including factors such as the length of the message or the quality
of the message. We will not concentrate on the second issue in this
paper.

Covert channel analysis is just a subset of information theory.
Information theory is concerned with sending signals from a
transmitter to a receiver, with the possibility of noise degrading
the signal fidelity. This process of transmission is a
communication channel or simply a channel. In general, the
transmitter takes a message and encodes it before transmitting it.
The receiver, once it receives the message, decodes it. The
brilliance of Shannon’s work is that it gives an upper limit on the
rate at which messages can be passed, within a certain given error
tolerance, through the communication channel (by the process of
encoding, transmitting, receiving, and decoding) based solely on
how noise affects the transmission of the signals. This upper limit
is referred to as the capacity of the channel.

The inputs from the transmitter to the channel constitute the input
alphabet. Sending an input letter across the channel is synonymous
with sending a symbol across the channel. The interpretation of the
received symbol by the receiver, prior to any decoding, is what
constitutes the output alphabet. If both the input and output
alphabets are discrete we have a discrete channel, which is usually
the case in covert channel analysis. The choice of alphabets often
approximates the actual physical process. This is especially true
when the output alphabet is made up of time values, e.g., [18]. In
general, the alphabets need not have anything in common. However,
if no noise exists in the channel then what the transmitter puts in
is what the receiver gets out, and thus the alphabets are
identical.

For the sake of convenience, the standard unit of time is a tick
(t). Capacity may be measured either in units of bits per channel
usage ($C_u$) or in units of bits per tick ($C_t$). If every symbol
takes the same amount of time $\tau$ to be sent across the
communication channel (a constant time channel), then
$C_t = \tau^{-1} C_u$. Therefore, without loss of generality, we
may use either measurement of capacity for a constant time channel.
In the literature, the meaning is usually made clear by examining
the units in which capacity is expressed.

A storage channel is a covert channel where the output alphabet
consists of different responses all taking the same time to be
transmitted. A timing channel is a covert channel where the output
alphabet is made up of different time values corresponding to the
same response. A mixed channel⁴ is a combination of the two. Even
though our definition of storage channel is not de jure identical
to Lampson’s [12], it is de facto the same. Our definitions capture
the operational differences between storage, timing, and mixed
channels.

For example, one type of storage channel is given by Low requesting
a resource and receiving the (constant time) reply that the
resource can be used or that the resource cannot be used. An output
alphabet of two distinct symbols results. In contrast, a timing
channel is one where Low always receives a response that the
resource is available for its use but receives the response at
different times. Section 2 was concerned with a timing channel. In
section 5 we discuss a storage channel.

For a timing (or mixed) channel, the capacities $C_t$ and $C_u$ no
longer differ by a constant multiple. Let us first show how to
calculate $C_u$. We will work with channels that are memoryless,
unless otherwise noted. By memoryless we mean that there are no
restrictions on what symbol may be transmitted based upon the prior
history of the channel. Each transmission is independent of the
past transmissions and they are not time-varying. We will mostly
look at channels that are both discrete and memoryless (DMC). This
is not a serious constraint because the capacity of a channel with
memory can often be bounded from above by a memoryless version of
the channel [11].

For a random variable X taking values $x_i$, let $H(X)$ denote the
entropy of X. The entropy measures the “information” or “surprise”
of the different values of X. For a particular value $x_i$ the
surprise is $-\log P(x_i)$; if $x_i$ happens with certainty then
its surprise is zero, and if $x_i$ never occurs then its surprise
is maximal at infinity. We always use the base two logarithm so
that the units of information are bits. The entropy is the expected
value of the information of X and has units of bits per outcome.
For ease of notation we will express the event $(X = x_i)$ as
$(x_i)$. Thus

$$H(X) = -\sum_i P(x_i) \log P(x_i) .$$

If X is the random variable representing the input to a channel,
the unit outcome of X is synonymous with the unit channel usage.

⁴ This is our own terminology and we will not concentrate on these
types of covert channels in this paper.
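As a quick numerical illustration of the entropy formula, the following sketch evaluates $H(X)$ for a few small distributions. The function name `entropy` is our own choice, and outcomes of probability zero are skipped, following the usual convention that they contribute nothing to the sum.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; outcomes with P = 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


print(entropy([0.5, 0.5]))                 # 1.0 (one fair binary choice)
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0
print(entropy([0.9, 0.1]))                 # less than 1 bit: a biased source
```

A certain outcome gives zero entropy, and the uniform distribution gives the largest value for a given alphabet size.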

Information theory is concerned with how the input or transmission
entropy changes while it travels through the channel. If the
channel is noiseless then the amount of information in a
transmission should be unchanged. However, if there is noise in the
channel then the fidelity of the signal is degraded and the
information sent is diminished. If the channel noise is great and
all-encompassing enough, then there is no more surprise in seeing
any one symbol over another. This is mathematically modeled by the
equivocation, or conditional entropy, $H(X \mid Y)$, where X is the
random variable representing the channel input and Y is the random
variable representing the channel output. Mathematically,

$$H(X \mid Y) = \sum_j H(X \mid y_j) P(y_j)
  = -\sum_{i,j} P(y_j) P(x_i \mid y_j) \log P(x_i \mid y_j) .$$

A good way to understand $H(X \mid Y)$ is to look at the extreme
cases: $H(X \mid X) = 0$, and $H(X \mid Y) = H(X)$ if X and Y are
independent.

For a DMC the noise is expressed by the conditional probabilities
$p_{ij} = P(y_j \mid x_i)$. Thus we see how the distributions for X
and the noise totally determine both $H(X)$ and $H(X \mid Y)$. The
mutual information in units of bits per channel usage,
$I_u(X, Y)$, measures how much information is actually sent across
the channel from input X to receiver Y. It is the difference
between the input entropy and the conditional entropy:

$$I_u(X, Y) = H(X) - H(X \mid Y) .$$

When transmitting, the transmitter can do nothing about the noise,
and the receiver is passive and waits for symbols to be passed over
the channel. However, the transmitter can send different symbols
with different frequencies; thus, there are different distributions
for X. By changing the frequency of the symbols sent, the
transmitter can affect the amount of information sent to the
receiver. $C_u$ is the maximum amount of information, in units of
bits per channel usage, that can be sent over the DMC:

$$C_u = \max I_u(X, Y) ,$$

where the maximum is taken over the different distributions on X.
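The definitions above can be checked numerically. The sketch below computes $I_u(X, Y) = H(X) - H(X \mid Y)$ from an input distribution and a matrix of conditional probabilities, and then approximates $C_u$ for a two-letter input alphabet by a simple grid search. The function names, the grid search, and the binary symmetric channel used as the example are our own illustrative choices, not part of the original analysis.

```python
import math


def entropy(probs):
    """Shannon entropy in bits."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def mutual_information(px, channel):
    """I_u(X, Y) = H(X) - H(X | Y), with channel[i][j] = P(y_j | x_i)."""
    n_in, n_out = len(px), len(channel[0])
    py = [sum(px[i] * channel[i][j] for i in range(n_in)) for j in range(n_out)]
    equivocation = 0.0
    for j in range(n_out):
        if py[j] == 0:
            continue  # an output that never occurs contributes nothing
        posterior = [px[i] * channel[i][j] / py[j] for i in range(n_in)]
        equivocation += py[j] * entropy(posterior)
    return entropy(px) - equivocation


def capacity_binary_input(channel, steps=1000):
    """Approximate C_u by a grid search over P(x_0) = k / steps."""
    return max(mutual_information([k / steps, 1 - k / steps], channel)
               for k in range(steps + 1))


bsc = [[0.9, 0.1], [0.1, 0.9]]   # binary symmetric channel, 10% crossover
print(round(capacity_binary_input(bsc), 4))  # 0.531
```

A grid search is crude but adequate for two input letters; for larger alphabets an iterative method would normally be used instead.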


Let $t_i$ be the amount of time to transmit input letter $x_i$
across the DMC. The random variable T is the time to send a symbol
and the distribution of T is determined by the distribution of X.
The mean of T, in units of ticks per channel usage, is represented
by $E(T)$. The mutual information in units of bits per tick,
$I_t(X, Y)$, for a DMC is

$$I_t(X, Y) = \frac{I_u(X, Y)}{E(T)} .$$

The capacity in units of bits per tick for a DMC is given by [23]

$$C_t = \max \frac{I_u(X, Y)}{E(T)} , \tag{1}$$

maximized as before. Of course, if this is a constant time DMC the
value $E(T)$ is distribution independent and we have our previous
formula $C_t = \tau^{-1} C_u$, where $\tau = E(T)$. Note that, in
general, $C_t$ is not $\frac{\max I_u(X, Y)}{E(T)}$; see [15].

If we have a timing DMC that is also noiseless then we refer to it
as a simple timing channel (STC). These have been studied in [20].
For a STC, $I_u(X, Y)$ is simply $H(X)$ so

$$C_t = \max \frac{H(X)}{E(T)} .$$

(For timing channels $C_u$ is not a useful concept and $C_t$ is
understood.) Even for this very trivial type of timing channel, an
exact calculation of capacity is difficult. The problem is
analogous to finding roots of a polynomial: an easy task if done
numerically, but a very difficult one if closed form solutions for
the roots are required [16, 20]. In general, for timing and mixed
channels, the capacity analysis is quite difficult.
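The numerical route can be sketched in a few lines. The sketch relies on Shannon's classical characterization of a noiseless channel with unequal symbol durations $t_i$: the capacity is $\log_2 x_0$, where $x_0$ is the positive real root of $\sum_i x^{-t_i} = 1$. The function names and the use of bisection are our own choices, and the second function only cross-checks the result against $H(X)/E(T)$.

```python
import math


def stc_capacity(durations, tol=1e-12):
    """Bits per tick for a noiseless channel with the given symbol durations:
    log2 of the root x0 >= 1 of sum(x ** -t) == 1, found by bisection."""
    def f(x):
        return sum(x ** -t for t in durations) - 1.0

    lo, hi = 1.0, 2.0
    while f(hi) > 0:          # grow the bracket until the root is enclosed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > 0:        # f is decreasing, so the root lies above mid
            lo = mid
        else:
            hi = mid
    return math.log2((lo + hi) / 2.0)


def rate(probs, durations):
    """H(X) / E(T) for one particular input distribution."""
    h = sum(-p * math.log2(p) for p in probs if p > 0)
    return h / sum(p * t for p, t in zip(probs, durations))


c = stc_capacity([1, 2])                 # two symbols taking 1 and 2 ticks
x0 = 2.0 ** c
print(round(c, 4))                                   # 0.6942
print(round(rate([x0 ** -1, x0 ** -2], [1, 2]), 4))  # 0.6942
print(round(rate([0.5, 0.5], [1, 2]), 4))            # 0.6667
```

The uniform distribution is not optimal here: the maximizing distribution favors the faster symbol, with $P(x_i) = x_0^{-t_i}$.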


Capacity Yes, Bandwidth No

Now let us leave the arena of covert channels and look at one very
complicated but important type of communication channel. The reason
for this is to drive home the point that we should not use the term
bandwidth or maximum bandwidth for capacity. Our communication
channel is still memoryless but it is no longer discrete. The input
is a continuous signal $f(t)$ with bandwidth W. By this we mean
that the Fourier transform $F(\omega)$ of $f(t)$ is zero for
$\omega > W$. Furthermore, the signal has an average power P. The
output of this channel is the sum of the input signal with
independent white noise of power N. Shannon showed [22] that

$$C_t = W \log \left( 1 + \frac{P}{N} \right) .$$

Therefore, we see that it is wrong to refer to bandwidth or maximum
bandwidth as capacity (e.g., [19], [6]) because the bandwidth is a
separate characteristic of a continuous channel and the capacity is
in fact a function of the bandwidth! We should not reinvent the
wheel; we should use the standard terminology that already exists.
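A two-line computation makes the distinction tangible. In this sketch the function name is our own, and the bandwidth is taken in cycles per unit time so that the result is in bits per the same unit of time.

```python
import math


def awgn_capacity(bandwidth, signal_power, noise_power):
    """Capacity W * log2(1 + P / N) of the band-limited white-noise channel."""
    return bandwidth * math.log2(1.0 + signal_power / noise_power)


# Same bandwidth, different signal-to-noise ratios: different capacities.
print(awgn_capacity(1.0, 1.0, 1.0))   # 1.0
print(awgn_capacity(1.0, 3.0, 1.0))   # 2.0
```

Two channels with identical bandwidth can therefore have very different capacities, which is exactly why the two terms should not be interchanged.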
4 The Small Message Criterion

Let us consider discrete communication channels that are noiseless
but not necessarily memoryless. Hence a symbol might take a longer
amount of time to be transmitted in the future than at other times,
or certain symbols may not be transmitted after other symbols
(run-length limited and/or time varying channels). Since the
channel is noiseless the symbol transmitted is the symbol that is
received. Each symbol that is transmitted takes an amount of time
to be sent. For a particular time value n, let $S_n$ represent all
of the allowed messages that take time n to be transmitted. If the
channel is memoryless then $S_n$ consists of all possible symbol
sequences whose total transmission time is n; see [20]. If the
channel has memory then we can only use the symbol sequences
allowed by the parameters of the communication channel. Let $|S_n|$
be the magnitude of the set $S_n$. Shannon’s original definition of
capacity for such a channel (he used the ordinary limit) is given
by

$$C_t = \limsup_{n \to \infty} \frac{\log |S_n|}{n} . \tag{2}$$

Note that for a STC the above Eq. (2), by [20, Thm. 2], reduces to
Eq. (1).

Zero Capacity Example

Assume we have a noiseless communication channel with two symbols.
The first time the channel is used either symbol can be sent in
1 tick, the second time the channel is used either symbol can be
sent in 2 ticks, the third time either symbol can be sent in
4 ticks, ..., the nth time either symbol can be sent in $2^{n-1}$
ticks.


Figure 3 (timeline: successive 1-bit transmissions taking 1, 2, 4,
and 8 ticks).


We see that by the nth transmission there are $2^n$ different
messages and that the total transmission time by the nth
transmission is $1 + 2 + \cdots + 2^{n-1}$ ticks. Therefore (after
some analysis) the capacity of this channel is

$$C_t = \lim_{n \to \infty} \frac{\log 2^n}{\sum_{i=0}^{n-1} 2^i}
      = \lim_{n \to \infty} \frac{n}{2^n - 1} = 0 .$$


However, we can send any message we want, with absolutely no loss
of fidelity, across this channel! Of course as the number of bits
that we wish to transmit grows polynomially, the transmission time
grows exponentially (this is what the capacity tells us). However,
if we have a small message, then who cares what the capacity is? In
this example we can noiselessly send a 4 bit message in 15 ticks.
Knowing that the capacity is zero does not tell us that we are in a
secure situation.

The lesson learned from the above example is an important one and
its ideas have been discussed before [17]. If one has a very
sensitive but short message then the capacity is not a sufficient
measure of security. Also we have previously discussed examples
such as these with Wittbold [24].

There are many other examples of this type. Say we have a channel
that noiselessly transmits 100 bits in the first tick and then is
total noise. The capacity of this channel is zero but the problem
is obvious. We need to develop a criterion, or criteria, in
addition to capacity, that should be required of secure systems.
Capacity, since it is an asymptotic definition, is fine if we are
concerned with sending very long files over a long period of time.
Then capacity gives us the existence of a (possibly very
complicated and long) code that will send messages at a certain
level of fidelity at a certain rate.

If our message is short then we need to know what level of covert
transmission will be tolerated. Three factors should be taken into
account. The first is the length of the message: how many bits? The
second factor is the fidelity of the message: is there a threshold
below which the degraded short message is no longer a security
threat? The third factor is the time frame: if our concern is a
10 bit message, do we care if it takes a second or a week for that
message to be transmitted? These three factors make up part of what
we envision as a small message criterion (SMC). The SMC should
reflect the following viewpoint, and depends on the triple
(n, t, p):

When a covert channel exists in a system, the SMC will give
guidelines for what will be tolerated in terms of covertly leaking
a short covert message of length n bits in time t with fidelity of
transmission p%. The SMC must be used in conjunction with capacity
for a full security analysis/validation of a system.
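One possible way to read the triple (n, t, p) operationally is sketched below. This is purely a hypothetical interpretation for illustration: the class name, the field names, and the rule "a leak matters if n bits get through within t ticks at fidelity at least p%" are assumptions, since the criterion itself is left informal here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmallMessageCriterion:
    """Hypothetical encoding of the SMC triple (n, t, p)."""
    n_bits: int        # n: length of the short message of concern
    max_ticks: int     # t: time frame of concern, in ticks
    fidelity: float    # p: fidelity (in percent) at which the leak matters

    def violated_by(self, ticks_needed, fidelity_achieved):
        """True if an n-bit message leaks within t ticks at fidelity >= p."""
        return (ticks_needed <= self.max_ticks
                and fidelity_achieved >= self.fidelity)


smc = SmallMessageCriterion(n_bits=4, max_ticks=100, fidelity=90.0)

# The zero-capacity channel of the previous example sends 4 bits noiselessly
# in 1 + 2 + 4 + 8 = 15 ticks.
ticks = sum(2 ** i for i in range(smc.n_bits))
print(smc.violated_by(ticks, fidelity_achieved=100.0))  # True
```

Under these assumed numbers the zero-capacity channel fails the criterion even though a capacity analysis alone would accept it.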

The SMC can itself be dynamic. If the sensitivity of the messages
goes up, the SMC can be tightened up so we can have trade-offs
between security and performance. In other words, distinctions
should be made between High and Very High, or Critical [2].

We note that previous work on formal models has attempted to
capture ideas similar to ours. The various information flow models
such as FM and AFM have concerned themselves with how Low
probabilities are independent/dependent of High [14, 9]. A
particularly interesting formal model has been put forth by Browne
in his Zero Information Finite Sample Theorem [3]. His theorem
captures the information theoretic essence of sending a message for
a limited amount of time. We believe that designing a system that
satisfies a model is at least as important as building the model
itself. In future work we plan to give system designers actual
trade-offs between performance and security concerns.

5 Tradeoffs

A communication theorist is concerned with how to send as many bits
as possible through a communication channel. A secure system
designer, on the other hand, wants to maximize security without
ending up with a secure brick. We must examine how security impacts
upon performance.

The needs for lowering capacity must be balanced against the needs
of performance. As we have discussed, the SMC presents us yet
another obstacle to performance, in that lowering capacity alone
does not eliminate the threat of covert channels.

Figure 4: Covert Channel Concerns

We have seen how the need for assurance, via acknowledgements in a
DBS, leads to a timing channel. In section 5.1 we will examine
another instance of how assurance, via an acknowledgement, leads to
a storage DMC in a DBS. We will then quantify how methods of
lowering covert channel capacity lead to a decline in performance.
We will not go into specific instances of how the SMC can affect
performance.

Let us examine Eq. (1) more carefully. We may express $C_t$, for a
DMC, as

$$C_t = \max \frac{H(X) - H(X \mid Y)}{E(T)} .$$

We see that there are two, not necessarily independent, ways of
lowering capacity. We can either try to increase the times that it
takes for symbols to pass over the channel, thus decreasing
$1/E(T)$, or we can increase the noise, which will increase
$H(X \mid Y)$, which will in turn decrease $I_u(X, Y)$. If we did
not care about performance this would be wonderful. Unfortunately,
we do care about performance.

In section 5.1 we discuss an approach to increasing noise in a
storage channel, which has the unfortunate effect of lowering
performance. In section 5.2 we discuss our recent work on a way of
lowering capacity and meeting the SMC, for a timing channel,
without sacrificing performance by using the “pump”.


5.1 Two Phase Commit Protocol

A multilevel secure replicated architecture database system
(MLS-RA DBS) using the two phase commit protocol (2PC) for atomic
commitment results in a storage channel. This is no surprise and
its mathematical details have been studied in [5]. We will briefly
summarize a simple idealized version of it here.

In a MLS-RA DBS, copies of lower data are retained in replicated
higher copies. When a particular low user (we will use the term
user for any user or process), designated as Low, wishes to update
a data item, all of the higher copies have to agree to the update
via a commit; if even one of them aborts then the low data item is
not updated and the Low user receives an abort. All of the
abort/commit voting is moderated through a trusted front end. This
prevents Low from knowing how the other users voted. However, if
there is one particular high user, designated as High, that wishes
to communicate covertly with Low, it may do so through this scheme.
Assume there are (n + 2) users: Low, High, and n other users with
levels higher than Low. A Trojan horse is in place so that Low and
High can always commit or abort as they wish. The covert
communication works as follows:

Low wishes to update the DBS. If Low receives a commit then it
interprets that as a 0; if Low receives an abort then it interprets
that as a 1. High can send a 1 to Low with no noise simply by
voting to abort. Therefore
$P(\mathrm{Low} = 1 \mid \mathrm{High} = 1) = 1$. However, if High
wishes to send a 0 to Low by voting to commit then there is the
possibility that one of the other users might vote to abort. Hence,
the transmission of the 0 is noisy. We assign the probability p to
another user aborting, and assume that everything takes one tick,
where a tick is the standard unit of time, so $C_u = C_t$. (The
actual question of delaying the votes is more complicated and will
not be looked at here.)


Figure 5: Z-Channel

This set-up forms what is known as a Z-channel [8].
$P(\mathrm{Low} = 0 \mid \mathrm{High} = 0) = (1 - p)^n$, since all
of the other n users must vote to commit; High sending a 0 and Low
getting a 1 comes about as the complement of all of the other users
voting to commit, thus
$P(\mathrm{Low} = 1 \mid \mathrm{High} = 0) = 1 - (1 - p)^n$. As a
security counter-measure, the system itself could increase p
(increase the noise) to lower capacity. In what follows we show
that this has the deleterious side effect of lowering performance.
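The effect of p on the capacity of this Z-channel can be explored numerically. The sketch below is a simple grid search and not the closed-form calculation of [8] and [5]; the function names are our own, and it uses the equivalent form $H(Y) - H(Y \mid X)$ of the mutual information, which is convenient here because only the input 0 is noisy.

```python
import math


def h2(x):
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def z_channel_capacity(p, n, steps=10000):
    """Grid-search capacity (bits per use) of the 2PC Z-channel.
    a = (1 - p) ** n is P(Low = 0 | High = 0); a 1 always arrives as a 1."""
    a = (1.0 - p) ** n
    best = 0.0
    for k in range(steps + 1):
        q = k / steps                    # q = P(High sends 0)
        info = h2(q * a) - q * h2(a)     # H(Y) - H(Y | X)
        best = max(best, info)
    return best


for p in (0.0, 0.01, 0.05, 0.1):
    print(p, z_channel_capacity(p, n=13))
```

With p = 0 the channel is noiseless and the capacity is one bit per use; the capacity then falls steadily as p increases.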

If there is no Trojan horse present then we want to
examine, as a measure of performance, how long Low
has to wait, on the average, for a global commit. Since
there are no longer any Trojan horses present, all (n + 2)
users abort with probability p. Let t be the random
variable representing how many times Low must try
to commit an update. $P(t = k)$, k a positive integer
representing the number of ticks, is given by


$$P(t = k) = \left(1 - (1 - p)^{n+2}\right)^{k-1} (1 - p)^{n+2}$$


Therefore t is a geometric random variable and its
mean is $E(t) = 1/(1 - p)^{n+2}$. Thus as p increases so does
E(t) (hence lowering performance); however, increasing
p causes $C_t$ to decrease (hence increasing security).
Hence, we see that there is a trade-off, as the two
following plots demonstrate.
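The mean wait is easy to evaluate for concrete values. The sketch below is illustrative only (the function names are ours): it computes the probability of a global commit in one tick, the geometric distribution of the number of tries, and the mean E(t), and it checks the closed-form mean against a truncated sum of k P(t = k). It assumes 0 <= p < 1 and that all (n + 2) users abort independently.

```python
def commit_probability(p: float, n: int) -> float:
    """Probability that all (n + 2) users vote to commit in one tick."""
    return (1.0 - p) ** (n + 2)


def wait_pmf(k: int, p: float, n: int) -> float:
    """P(t = k): the first (k - 1) tries abort and the k-th try commits."""
    s = commit_probability(p, n)
    return (1.0 - s) ** (k - 1) * s


def mean_wait(p: float, n: int) -> float:
    """E(t) = 1 / (1 - p)^(n + 2), the mean of the geometric distribution."""
    return 1.0 / commit_probability(p, n)


def mean_wait_by_summation(p: float, n: int, terms: int = 2000) -> float:
    """Truncated sum of k * P(t = k), used only as a sanity check."""
    return sum(k * wait_pmf(k, p, n) for k in range(1, terms + 1))


if __name__ == "__main__":
    n = 13
    for p in (0.0, 0.01, 0.05, 0.1):
        print(f"p={p:.2f}  E(t)={mean_wait(p, n):.3f} ticks  "
              f"1/E(t)={1.0 / mean_wait(p, n):.3f}")
```

For n = 13 the exponent is 15, so even a modest abort probability lengthens the wait noticeably.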


Figure 6 is a plot of E(t) for n = 13. Figure 7 is
a dimensionless plot of $C_t$ and $1/E(t)$ for n = 13,
and it shows the striking relationship of diminishing
capacity to diminishing performance. The calculations
for capacity can be found in [8] and [5].


Figure 6: mean wait, n = 13

Figure 7: lowering capacity lowers performance (bottom plot = capacity, top plot = inverse mean wait, n = 13)

We will not address the SMC for this 2PC covert
storage channel.

5.2  The  “pump” 

The purpose of the pump is to provide a reliable,
recoverable, and quasi-secure low-to-high communication
scheme. We summarize from an earlier paper of ours
[11]. The pump is designed to be quasi-secure because
it does not sacrifice performance when Low and High
response times are approximately equal. The pump
works as follows:

•  A low process (Low) sends a message to the pump
with the address of its (high) destination. If Low
receives an ack then Low assumes that the message
will be safely delivered to its destination and sends
the next message.

•  When the pump receives a message from Low, it
stores the message in its buffer and sends an ack
to Low.

•  The pump delivers messages that are stored in its
buffer to the proper destinations. When there is
an ack from the destination, the pump knows that
the message has been safely delivered to its
destination and deletes the message from its buffer.

Since the timing of acks to Low can be used as a
covert timing channel, the pump introduces random
noise when it sends an ack to Low. This random noise
is a function of a chosen probability distribution and
the moving average of ack times from destinations to
the pump.
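The behaviour described above can be sketched in a few lines. This is a toy model under stated assumptions, not the design from [11]: the class name, the window size, and the choice of an exponential distribution whose mean equals the moving average of the destinations' ack times are all our own illustrative choices, since the text only says that the noise depends on "a chosen probability distribution" and the moving average. Buffer limits and recovery are ignored.

```python
import random
from collections import deque


class PumpSketch:
    """Toy model of the pump: buffer messages, randomise acks to Low."""

    def __init__(self, window: int = 5, seed=None):
        self.buffer = deque()                        # messages awaiting delivery
        self.high_ack_times = deque(maxlen=window)   # recent ack times from High
        self.rng = random.Random(seed)

    def moving_average(self) -> float:
        """Moving average of the recorded ack times from destinations."""
        if not self.high_ack_times:
            return 0.0
        return sum(self.high_ack_times) / len(self.high_ack_times)

    def receive_from_low(self, message) -> float:
        """Store the message; return the random delay before acking Low."""
        self.buffer.append(message)
        mean = self.moving_average()
        if mean <= 0.0:
            return 0.0  # no history yet, so ack immediately
        # Assumption: exponential delay with mean equal to the moving average.
        return self.rng.expovariate(1.0 / mean)

    def deliver_to_high(self, high_ack_time: float):
        """Deliver the oldest message; on High's ack, delete it and
        record how long the ack took."""
        message = self.buffer.popleft()
        self.high_ack_times.append(high_ack_time)
        return message


if __name__ == "__main__":
    pump = PumpSketch(window=3, seed=1)
    pump.receive_from_low("m1")
    pump.deliver_to_high(high_ack_time=2.0)
    print("delay before next ack to Low:", pump.receive_from_low("m2"))
```

In this sketch the delay Low observes tracks High's average response time but not any individual ack from High, which is the property the random noise is meant to provide.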
The pump does not sacrifice performance to lower
capacity. Another interesting aspect of the pump is that
it is sensitive to the SMC. We have shown [11, Sec. 5.1]
how the construction of the pump makes it very difficult
to send even the smallest of messages in a small
amount of time. The use of the distribution parameters
gives the system designer control parameters that
can be adjusted to meet different security criteria. We
feel that this is a fruitful avenue for future work.

6  Conclusions 

In  this  paper  we  presented  our  position  that  covert 
channels  can  never  be  totally  eliminated  from  high- 
assurance  computing  systems.  We  discussed  two  major 
misconceptions  in  the  theory  of  covert  channels.  The 
first  is  the  use  of  the  term  bandwidth  and  the  second 
is  the  reliance  on  capacity  alone  as  a  measure  of  covert 
channel  vulnerability.  We  then  introduced  the  small 
message  criterion  as  a  way  of  supplementing  capacity 
to  evaluate  the  covert  channel  threat. 

Finally, we discussed ways, e.g., the pump, of minimizing
the threat of covert channels without drastically
reducing performance.